172 research outputs found

    Identification of microservices from monolithic applications through topic modelling

    Get PDF
    Microservices emerged as one of the most popular architectural patterns in the recent years given the increased need to scale, grow and flexibilize software projects accompanied by the growth in cloud computing and DevOps. Many software applications are being submitted to a process of migration from its monolithic architecture to a more modular, scalable and flexible architecture of microservices. This process is slow and, depending on the project’s complexity, it may take months or even years to complete. This paper proposes a new approach on microservice identification by resorting to topic modelling in order to identify services according to domain terms. This approach in combination with clustering techniques produces a set of services based on the original software. The proposed methodology is implemented as an open-source tool for exploration of monolithic architectures and identification of microservices. A quantitative analysis using the state of the art metrics on independence of functionality and modularity of services was conducted on 200 open-source projects collected from GitHub. Cohesion at message and domain level metrics’ showed medians of roughly 0.6. Interfaces per service exhibited a median of 1.5 with a compact interquartile range. Structural and conceptual modularity revealed medians of 0.2 and 0.4 respectively. Our first results are positive demonstrating beneficial identification of services due to overall metrics’ resultsNational Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project UIDB/50014/202

    A Backend Platform for Supporting the Reproducibility of Computational Experiments

    Full text link
    In recent years, the research community has raised serious questions about the reproducibility of scientific work. In particular, since many studies include some kind of computing work, reproducibility is also a technological challenge, not only in computer science, but in most research domains. Replicability and computational reproducibility are not easy to achieve, not only because researchers have diverse proficiency in computing technologies, but also because of the variety of computational environments that can be used. Indeed, it is challenging to recreate the same environment using the same frameworks, code, data sources, programming languages, dependencies, and so on. In this work, we propose an Integrated Development Environment allowing the share, configuration, packaging and execution of an experiment by setting the code and data used and defining the programming languages, code, dependencies, databases, or commands to execute to achieve consistent results for each experiment. After the initial creation and configuration, the experiment can be executed any number of times, always producing exactly the same results. Furthermore, it allows the execution of the experiment by using a different associated dataset, and it can be possible to verify the reproducibility and replicability of the results. This allows the creation of a reproducible pack that can be re-executed by anyone on any other computer. Our platform aims to allow researchers in any field to create a reproducibility package for their science that can be re-executed on any other computer. To evaluate our platform, we used it to reproduce 25 experiments extracted from published papers. We have been able to successfully reproduce 20 (80%) of these experiments achieving the results reported in such works with minimum effort, thus showing that our approach is effective

    Data curation: towards a tool for all

    Get PDF
    Data science has started to become one of the most important skills one can have in the modern world, due to data taking an increasingly meaningful role in our lives. The accessibility of data science is however limited, requiring complicated software or programming knowledge. Both can be challenging and hard to master, even for the simple tasks. With this in mind, we have approached this issue by providing a new data science platform, termed DS4All.Curation, that attempts to reduce the necessary knowledge to perform data science tasks, in particular for data cleaning and curation. By combining HCI concepts, this platform is: simple to use through direct manipulation and showing transformation previews; allows users to save time by eliminate repetitive tasks and automatically calculating many of the common analyses data scientists must perform; and suggests data transformations based on the contents of the data, allowing for a smarter environment.This work is financed by National Funds through the Portuguese funding agency, FCT - Fundação para a Ciência e a Tecnologia within project UIDB/50014/2020

    Automatically inferring ClassSheet models from spreadsheets

    Get PDF
    Many errors in spreadsheet formulas can be avoided if spreadsheets are built automatically from higher-level models that can encode and enforce consistency constraints. However, designing such models is time consuming and requires expertise beyond the knowledge to work with spreadsheets. Legacy spreadsheets pose a particular challenge to the approach of controlling spreadsheet evolution through higher-level models, because the need for a model might be overshadowed by two problems: (A) The benefit of creating a spreadsheet is lacking since the legacy spreadsheet already exists, and (B) existing data must be transferred into the new model-generated spreadsheet.To address these problems and to support the modeldriven spreadsheet engineering approach, we have developed a tool that can automatically infer ClassSheet models from spreadsheets. To this end, we have adapted a method to infer entity/relationship models from relational database to the spreadsheets/ClassSheets realm. We have implemented our techniques in the HAEXCEL framework and integrated it with the ViTSL/Gencel spreadsheet generator, which allows the automatic generation of refactored spreadsheets from the inferred ClassSheet model. The resulting spreadsheet guides further changes and provably safeguards the spreadsheet against a large class of formula errors. The developed tool is a significant contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.(undefined

    Discovery-based edit assistance for spreadsheets

    Get PDF
    Spreadsheets can be viewed as a highly flexible endusers programming environment which enjoys wide-spread adoption. But spreadsheets lack many of the structured programming concepts of regular programming paradigms. In particular, the lack of data structures in spreadsheets may lead spreadsheet users to cause redundancy, loss, or corruption of data during edit actions. In this paper, we demonstrate how implicit structural properties of spreadsheet data can be exploited to offer edit assistance to spreadsheet users. Our approach is based on the discovery of functional dependencies among data items which allow automatic reconstruction of a relational database schema. From this schema, new formulas and visual objects are embedded into the spreadsheet to offer features for auto-completion, guarded deletion, and controlled insertion. Schema discovery and spreadsheet enhancement are carried out automatically in the background and do not disturb normal user experience

    A type-level approach to component prototyping

    Get PDF
    Algebraic theories for modeling components and their interactions offer abstraction over the specifics of component states and interfaces. For example, such theories deal with forms of sequential composition of two components in a manner independent of the type of data stored in the states of the components, and independent of the number and types of methods offered by the interfaces of the combinators. General purpose programming languages do not offer this level of abstraction, which implies that a gap must be bridged when turning component models into implementations. In this paper, we present an approach to prototyping of component-based systems that employs so-called type-level programming (or compile-time computation) to bridge the gap between abstract component models and their type-safe implementation in a functional programming language. We demonstrate our approach using Barbosa’s model of components as generalized Mealy machines. For this model, we develop a combinator library in Haskell, which uses typelevel programming with two effects. Firstly, wiring between components is computed during compilation. Secondly, the well-formedness of the component compositions is guarded byHaskell’s strong type system.Fundação para a Ciência e a Tecnologia, Portugal, under grant number SFRH/BD/30231/2006

    From spreadsheets to relational databases and back

    Get PDF
    This paper presents techniques and tools to transform spreadsheets into relational databases and back. A set of data refinement rules is introduced to map a tabular datatype into a relational database schema. Having expressed the transformation of the two data models as data refinements, we obtain for free the functions that migrate the data. We use well-known relational database techniques to optimize and query the data. Because data refinements define bidirectional transformations we can map such database back to an optimized spreadsheet. We have implemented the data refinement rules and we have constructed tools to manipulate, optimize and refactor Excel-like spreadsheets.(undefined

    Get your spreadsheets under (version) control

    Get PDF
    Spreadsheets play a pivotal role in many organizations. They serve to store and manipulate data or forecasting, and they are often used to help in the decision process, thus directly impacting the success, or not, of organizations. As the research community already realized, spreadsheets tend to have the same problems “professional” software contain. Thus, in the past decade many software engineering techniques have been successfully proposed to aid spreadsheet developers and users. However, one of the most used mechanisms to manage software projects is still lacking in spreadsheets: a version control system. A version control system allows for collaborative development, while also allowing individual developers to explore different alternatives without compromising the main project. In this paper we present a version control system, named SheetGit, oriented for end-user programmers. It allows to graphically visualize the history of versions (including branches), to switch between different versions just by pointing and clicking, and to visualize the differences between any two versions in an animated way. To validate our approach/tool we performed an empirical evaluation which shows evidence that SheetGit can aid users when compared to other tools.- (undefined
    corecore